MOHAQ: Multi-Objective Hardware-Aware Quantization of recurrent neural networks

Authors

Abstract

The compression of deep learning models is of fundamental importance for deploying such models to edge devices. The selection of compression parameters can be automated to meet changes in the hardware platform and the application. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers both performance and inference error as objectives for mixed-precision quantization. The proposed method feasibly evaluates candidate solutions in a large search space by relying on two steps. First, post-training quantization is applied for fast solution evaluation (inference-only search). Second, we propose a "beacon-based search" that retrains selected solutions only and uses them as beacons to estimate the effect of retraining on other solutions. We use speech recognition models on the TIMIT dataset. Experimental evaluations show that a Simple Recurrent Unit (SRU)-based model can be compressed up to 8x without any significant error increase. On SiLago, we found solutions that achieve 97% and 86% of the maximum possible speedup and energy saving, respectively, with a minor error increase for an SRU-based model. On Bitfusion, the beacon-based search reduced the error gain of the inference-only search for a Light Gated Recurrent Unit (LiGRU)-based model by 4.9 and 3.9 percentage points, respectively.
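To make the two-step search concrete, below is a minimal Python sketch of the idea. The candidate encoding, the synthetic ptq_error and hw_cost proxies, and the beacon list are illustrative assumptions, not the paper's implementation; the sketch only shows how mixed-precision candidates can be scored with a fast post-training-quantization proxy, corrected using a few retrained "beacon" configurations, and filtered to a Pareto front over error and hardware cost.

import random

LAYERS = 6                     # hypothetical number of quantizable layers
BITWIDTH_CHOICES = (2, 4, 8)   # candidate per-layer bit-widths


def ptq_error(candidate):
    """Fast proxy: error after post-training quantization (inference-only search).
    A synthetic function here; in practice, evaluate the quantized model on a
    validation set (e.g. TIMIT) without retraining."""
    return sum(1.0 / b for b in candidate) / len(candidate)


def hw_cost(candidate):
    """Hardware objective proxy (lower is better), standing in for latency or
    energy reported by a platform model such as SiLago or Bitfusion."""
    return sum(candidate) / (8.0 * len(candidate))


def beacon_corrected_error(candidate, beacons):
    """Beacon-based search: retrain only a few beacon configurations and reuse
    their measured error reduction to estimate the retrained error of nearby
    candidates (nearest beacon by per-layer bit-width distance)."""
    nearest = min(beacons, key=lambda b: sum(abs(x - y) for x, y in zip(candidate, b[0])))
    return max(0.0, ptq_error(candidate) - nearest[1])


def dominates(a, b):
    """a dominates b if it is no worse in both objectives and better in one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


# Hypothetical beacons: (configuration, error reduction measured after retraining).
beacons = [((8,) * LAYERS, 0.01), ((2,) * LAYERS, 0.15)]

# Sample the mixed-precision search space and keep the Pareto front over
# (estimated error, hardware cost).
population = [tuple(random.choice(BITWIDTH_CHOICES) for _ in range(LAYERS)) for _ in range(200)]
scored = [(beacon_corrected_error(c, beacons), hw_cost(c), c) for c in population]
pareto = [s for s in scored if not any(dominates(o, s) for o in scored)]
for err, cost, cfg in sorted(pareto):
    print(f"error~{err:.3f}  hw-cost~{cost:.2f}  bits={cfg}")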


Similar Articles

Alternating Multi-bit Quantization for Recurrent Neural Networks

Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For server applications with large-scale concurrent requests, the latency during inference can also be critical given the cost of computing resources. In this work, we address these problems by quantizing the netw...


Effective Quantization Methods for Recurrent Neural Networks

Reducing bit-widths of weights, activations, and gradients of a Neural Network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts for quantization of RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize...
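As a concrete illustration of the bit-width reduction described above, the sketch below applies symmetric uniform per-tensor quantization to a weight matrix at several bit-widths. The NumPy-based formulation and the quantize_weights helper are assumptions for illustration, not the method proposed in the cited paper.

import numpy as np


def quantize_weights(w, bits):
    """Map float weights to 2**bits - 1 evenly spaced levels around zero and
    return the de-quantized approximation plus the integer codes."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax      # one scale per tensor (per-tensor quantization)
    codes = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return codes.astype(np.float32) * scale, codes


w = np.random.randn(4, 4).astype(np.float32)
for bits in (8, 4, 2):
    w_hat, _ = quantize_weights(w, bits)
    print(bits, "bits, mean abs error:", float(np.mean(np.abs(w - w_hat))))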


Effective Quantization Approaches for Recurrent Neural Networks

Deep learning approaches, Recurrent Neural Networks (RNNs) in particular, have shown superior accuracy in a large variety of tasks including machine translation, language understanding, and movie frame generation. However, these deep learning approaches are very expensive in terms of computation. In most cases, Graphics Processing Units (GPUs) are used for large-scale implementations. Meanwhile, energy e...


Recurrent Neural Networks Hardware Implementation on FPGA

Recurrent Neural Networks (RNNs) have the ability to retain memory and learn data sequences. Due to the recurrent nature of RNNs, it is sometimes hard to parallelize all of their computations on conventional hardware. CPUs do not currently offer large parallelism, while GPUs offer limited parallelism due to the sequential components of RNN models. In this paper we present a hardware implementation of Lo...


Label-Dependencies Aware Recurrent Neural Networks

In the last few years, Recurrent Neural Networks (RNNs) have proved effective on several NLP tasks. Despite such great success, their ability to model sequence labeling is still limited. This has led research toward solutions where RNNs are combined with models that have already proved effective in this domain, such as CRFs. In this work we propose a solution that is far simpler but very effective: an evoluti...



Journal

Journal: Journal of Systems Architecture

Year: 2022

ISSN: 1383-7621, 1873-6165

DOI: https://doi.org/10.1016/j.sysarc.2022.102778